Square root decomposition
Square root decomposition (also called sqrt decomposition or block decomposition) is a family of techniques that achieve per operation — or overall — by partitioning an array (or query set) into blocks of size and maintaining a summary for each block. It is simpler to implement than a Segment tree or Fenwick tree and is often the only feasible approach for problems where the per-element operation is too expensive for logarithmic data structures.
Description
The basic idea is to divide an array of elements into blocks of size . For each block we precompute a summary (e.g. block sum, block minimum). A range query then decomposes into:
- Up to two partial blocks at the two ends — touched element by element.
- complete blocks in the middle — answered in each via the precomputed summary.
This costs per query, minimized at giving
Range sum with point updates
The canonical implementation for range-sum queries with point updates:
struct SqrtSum {
int n, B;
vector<long long> a, block;
SqrtSum(vector<int>& v) : n(v.size()), B(max(1, (int)sqrt(n))),
a(v.begin(), v.end()), block(n / B + 1, 0) {
for (int i = 0; i < n; i++) block[i / B] += a[i];
}
void update(int i, int val) {
block[i / B] += val - a[i];
a[i] = val;
}
long long query(int l, int r) { // sum of a[l..r], inclusive
long long res = 0;
int bl = l / B, br = r / B;
if (bl == br) {
for (int i = l; i <= r; i++) res += a[i];
} else {
for (int i = l; i < (bl + 1) * B; i++) res += a[i];
for (int b = bl + 1; b < br; b++) res += block[b];
for (int i = br * B; i <= r; i++) res += a[i];
}
return res;
}
};
Both update and query run in . Precomputation is
Range updates with range queries (lazy blocks)
When we need range updates (add to every element in ) and range
queries, we keep a lazy addend lazy[b] for each complete block and a
separate block_sum[b] for the sum of the raw a[] values within that block.
Partial-block updates touch elements directly; complete-block updates only
increment lazy[b]
struct SqrtRange {
int n, B;
vector<long long> a, lazy, block_sum;
SqrtRange(int n) : n(n), B(max(1, (int)sqrt(n))),
a(n, 0), lazy(n / B + 1, 0), block_sum(n / B + 1, 0) {}
// add val to a[l..r]
void update(int l, int r, long long val) {
int bl = l / B, br = r / B;
if (bl == br) {
for (int i = l; i <= r; i++) { a[i] += val; block_sum[bl] += val; }
} else {
for (int i = l; i < (bl+1)*B; i++) { a[i] += val; block_sum[bl] += val; }
for (int b = bl+1; b < br; b++) lazy[b] += val;
for (int i = br*B; i <= r; i++) { a[i] += val; block_sum[br] += val; }
}
}
// sum of a[l..r]
long long query(int l, int r) {
long long res = 0;
int bl = l / B, br = r / B;
if (bl == br) {
for (int i = l; i <= r; i++) res += a[i];
res += lazy[bl] * (r - l + 1);
} else {
for (int i = l; i < (bl+1)*B; i++) res += a[i];
res += lazy[bl] * ((bl+1)*B - l);
for (int b = bl+1; b < br; b++) res += block_sum[b] + lazy[b] * B;
for (int i = br*B; i <= r; i++) res += a[i];
res += lazy[br] * (r - br*B + 1);
}
return res;
}
};
For a minimum query with range assignment updates, store a per-block minimum and replace the lazy addend with a lazy assignment flag; rebuild the block minimum only when touching boundary elements.
Complexity
For operations on elements: total time . In practice, tuning (e.g. or for ) and cache effects can matter.
Applications
- Range queries with expensive per-element operations. When the underlying operation doesn't easily compose (e.g. counting distinct values, XOR of sorted medians), a sqrt block can do it with brute-force per partial block and a maintained summary per complete block.
- Offline range queries — Mo's algorithm By sorting queries in a specific order derived from block assignments, all queries are answered in total using two pointers that expand/shrink a sliding window.
- Answering queries on static trees.Mo's algorithm on trees flattens a tree into an Euler-tour sequence and applies the same trick.
- √-time updates on a Segment tree / Fenwick tree Some problems mix arbitrary updates with structural queries; a sqrt layer can absorb recent updates in a buffer and flush them in bulk.
- Persistent / functional arrays. Sqrt decomposition over a version tree enables rollbacks without a fully persistent structure.
Variants
Mo's algorithm (offline range queries)
Mo's algorithm is a direct application of sqrt decomposition to offline queries. Sort queries by (block of , pairwise direction-alternating) so that when processing them in order the total movement of the two pointers is . Each step adds or removes one element from the current window, requiring work per element.
struct Mo {
int n, B;
struct Query { int l, r, idx; };
vector<Query> queries;
Mo(int n, int q) : n(n), B(max(1, (int)sqrt(n))) { queries.reserve(q); }
void add_query(int l, int r, int idx) { queries.push_back({l, r, idx}); }
// add / remove must maintain the current answer for [curL, curR]
template<class Add, class Rem, class Ans>
vector<typename result_of<Ans()>::type>
solve(Add add, Rem rem, Ans ans) {
sort(queries.begin(), queries.end(), [&](auto& a, auto& b) {
int ba = a.l / B, bb = b.l / B;
if (ba != bb) return ba < bb;
return (ba & 1) ? a.r > b.r : a.r < b.r;
});
using T = decltype(ans());
vector<T> res(queries.size());
int curL = 0, curR = -1;
for (auto& q : queries) {
while (curR < q.r) add(++curR);
while (curL > q.l) add(--curL);
while (curR > q.r) rem(curR--);
while (curL < q.l) rem(curL++);
res[q.idx] = ans();
}
return res;
}
};
Sqrt decomposition with rollback (Mo's algorithm with updates)
When queries also have point updates, a third parameter — time — is introduced. Queries become triples ; the block size for the time axis is also , giving total time or
Sqrt of a Segment tree (sqrt-bucket tree)
For problems that need both queries and updates per position but are too complex for a standard segment tree, one can build a two-level structure: the bottom level is plain arrays of size , and the top level maintains block summaries. Updates touch one block in ; queries scan blocks.
Static range minimum with sparse table fallback
When updates are absent, a Sparse table answers RMQ in per query after preprocessing, and is strictly better. Sqrt decomposition is the go-to when updates are present and per operation from a segment tree is either too slow (because of large constants) or overkill.
Problems
Problems are grouped by the idea each exercises. Selected entries include a solution sketch focused on how sqrt decomposition is used.
Basic range queries
- Range Sum Queries II (CSES) — range sum with point updates; a warmup where a Fenwick tree is faster but sqrt works.
- Range Minimum Queries (CSES) — range min with point updates.
- XOR on Segment (Codeforces 242E) — range XOR update, range sum query.
Solution sketch — XOR on Segment
Maintain blocks of size . For a range-XOR update
with value : apply element-by-element to the two partial blocks; for
each complete block, instead of touching every element, store a lazy XOR
lazy[b]. The block sum becomes (sum when no extra XOR) and
(sum XORed by ); toggling the lazy bit swaps them in .
A range-sum query accumulates element sums (offset by lazy) for partial blocks
and sum[b] for complete blocks. Both operations are
Offline range queries (Mo's algorithm)
- D-query (SPOJ) — count distinct values in a range.
- Powerful Array (Codeforces 86D) — sum of over all values in a range, where is the count.
- Curious Cupid (Kattis)
Solution sketch — Powerful Array (CF 86D)
Maintain a frequency array cnt[v] and the current answer ans. When element
is added, ans += 2*cnt[v] + 1, then cnt[v]++; when removed, cnt[v]--,
then ans -= 2*cnt[v] + 1. Each add/remove is , so Mo's algorithm
answers all queries in total.
Solution sketch — D-query
Maintain cnt[v] (count of value in current window) and distinct (number
of values with non-zero count). On add: if cnt[v] was , increment
distinct. On remove: if cnt[v] drops to , decrement distinct. Mo's
ordering makes the total pointer movement
Range updates + range queries (lazy sqrt)
- Ynoi2015 - 苦痛 — range add, range sum of squares; showcase of the lazy-block technique.
- Sqrt Queries (CSES) — range assign and range sum.
Solution sketch — Range assign, range sum
Each block stores a lazy assignment flag and the block sum. A range-assign
: for partial blocks, write the value element-by-element
and recompute the block sum (); for complete blocks, set lazy[b] = v
and block_sum[b] = v * B ( per block). A range-sum query: for partial
blocks, sum elements plus lazy if set; for complete blocks, add block_sum[b]
directly. Both are
Harder applications
- Tree and Queries (Codeforces 375D) — count values appearing times in a subtree range (Mo's on trees).
- Jeff and Removing Periods (Codeforces 351D) — offline; maintain a bitset of "active" elements.
- 2D queries (SPOJ IOPC1207) — 2D prefix sums; sqrt decomposition reduces update cost.
See also
- Mo's algorithm — offline range queries via sqrt block ordering
- Mo's algorithm on trees — applies Mo's to tree paths via Euler tours
- Segment tree — alternative when lazy propagation is feasible
- Fenwick tree — simpler structure for prefix-sum problems
- Sparse table — range minimum for static arrays

